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In this work, wc provide a detailed theoretical analysis, supported by numerical tests, of the reliability of 
the adaptive resolution simulation (AdResS) technique in sampling the Grand Canonical ensemble. We 
demonstrate that the correct density and radial distribution functions in the hybrid region, where molecules 
change resolution, are two necessary conditions for considering the atomistic and coarse-grained regions in 
AdResS equivalent to subsystems of a full atomistic system with an accuracy up to the second order with 
respect to the probability distribution of the system. Moreover, we show that the work done by the thermostat 
and a thermodynamic force in the transition region is formally equivalent to balance the chemical potential 
difference between the different resolutions. From these results follows the main conclusion that the atomistic 
region exchanges molecules with the coarse-grained region in a Grand Canonical fashion with an accuracy up to 
(at least) second order. Numerical tests, for the relevant case of liquid water at ambient conditions, are carried 
out to strengthen the conclusions of the theoretical analysis. Finally, in order to show the computational 
convenience of AdResS as a Grand Canonical set up, we compare our method to the Insertion Particle 
Method (IMP) in its most efficient computational implementation. This fruitful combination of theoretical 
principles and numerical evidence candidates the adaptive resolution technique as a natural, general and 
efficient protocol for Grand Canonical Molecular Dynamics for the case of large systems. 



I. INTRODUCTION 

The Adaptive Resolution Simulation (AdResS) scheme^^ is a method that in a concurrent fashion simulates a 
molecular system treated with different resolutions in different regions of space. The word "resolution" here refers to 
the amount of details included in a molecular model: the higher resolution corresponds to a more precise model, but 
at higher computational costs. The advantage is obvious, it keeps track of both: the local fine-grained processes in 
the high-resolution regions and the large scale behavior, requiring a less demanding computational effort compared to 
the system that, as a whole, would be described by the high-resolution model. The key feature of AdResS is that it 
allows for an on-the-fly change of resolution when a molecule travels from a high-resolution region to a low-resolution 
region and vice versa. Moreover recent research^ has numerically demonstrated that different regions (with different 
resolutions) reach a thermodynamic equilibrium as if the whole system were equilibrated under the high-resolution 
description. Therefore one is tempted to state that a region with a certain resolution exchanges molecules with the rest 
of the system in a Grand Canonical fashion. This work studies the minimal necessary conditions, such that AdResS 
performs effective Grand Canonical simulations. Moreover we shed light on the level accuracy of the sampling in a 
Grand Canonical fashion; it will be shown that indeed the sampling can be considered of the Grand Canonical type 
if we accept an accuracy on the probability distribution of the system up to the second order. Though the second 
order is numerically satisfying, higher level of accuracy can be systematically reached by requiring to match, in the 
transition region, higher orders of the probability distribution. 

From the numerical point of view, for systems which are large enough, a (relatively) small subsystem of a full 
atomistic region is a natural Grand Canonical ensemble, for this reason the approach we use here is that of showing 
the equivalence between the atomistic region of AdResS and the same region in a full atomistic simulation. Extensive 
numerical tests, for the relevant case of liquid water at ambient conditions, are presented in order to show that the 
hypothesis done within the theoretical analysis are justified from the numerical point of view. Beyond our expectations, 
we have found that in the atomistic region of AdResS not only the accuracy can be tuned up to the second order, 
but, without any additional correction, even the three body correlation function of molecular centers of mass (COM) 
turns out to be the same as that of a full atomistic simulation. Finally, a further numerical test was carried out: the 
coarse-grained molecules are substituted by a liquid of spheres interacting via the Weeks-Chandler- Andersen (WCA) 
potential^ which has no reference to the atomistic resolution. We show that also in this case in the atomistic region, 
due to the work of the filter of the transition region, an accuracy up to the third order in the probability distribution is 
achieved. The results of this work allow us to make a step further in the possibility of employing the AdResS scheme 
as a natural, general numerical protocol for truly Grand Canonical Molecular Dynamics simulations independently 
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from the nature of the particles acting as a reservoir in the coarse-grained region. The word "natural" refers to the 
fact that strictly speaking in statistical mechanics the Grand Canonical ensemble is defined in operative terms as 
a subsystem of an infinitely large system. Of course in simulation systems are never infinitely large, however they 
can be large enough to numerically satisfy this condition. In this sense, in our approach molecules are exchanged 
between the atomistic domain and the coarse-grained reservoir in a straightforward (i.e. "natural") dynamical way. 
The advantage compared to previous Grand Canonical schemes for molecular dynamics is that the proper exchange of 
particles requires neither the knowledge or the calculation of the chemical potential, nor additional expensive insertion 
or removing of particles^"—; we actually compare the computational costs of our method to those of the IPM and 
show that for dense liquid systems AdResS is more efficient. The paper is organized as follows: first we briefly describe 
the essential principles on which AdResS is based, then we discuss the approximations under which one can write 
the probability distribution of the system in terms of Grand Canonical partition function. Next we show that the 
work of the thermostat plus that of the thermodynamic force balances the difference in chemical potential; this idea 
allows us to derive the necessary conditions of simulation to reach the accuracy in terms of orders of the probability 
distribution of the system. Finally the numerical results and the efficiency of the method are discussed. 



II. A BRIEF DESCRIPTION OF THE ADRESS SCHEME 



In this work, the higher resolution refers to the atomistic description (AT) of a molecule, while the lower resolution 
refers to its corresponding coarse-grained (CG) model (for a pictorial representation of a typical AdResS set up, 
see FigU] for the case of liquid water). We assume that the dynamics of the system is subject to the Langcvin 
equation, which is the thermostat usually used in the AdResS simulations. We denote the phase space variable x ~ 
(ri, ■ • • , r^r, Vi, - ■ ■ , vn); ri, Vi are the center-of-mass degrees of freedoms (DOFs) of the molecules. For simplicity we 
do not explicitly write the atomistic DOFs, but one should keep in mind that the interaction between two molecules 
with atomistic resolution are given by the sum of all pair-wise atomic interactions. Also for the sake of simplicity, 
we do not explicitly distinguish between the COM and the atomistic coordinates, using a generalized formalism 
where the coordinates are indicated simply as r^, however the interpretation of such a notation is made clear by the 
specific context. If the system is conservative, then one obtains the equilibrium density distribution p{x) oc e~^'^^'^\ 
where /3 = l/fc^T is the inverse temperature. In this paper, we assume that the kinetic part of the Hamiltonian is 
decoupled from the configurational part, then it is usually more convenient to consider the configurational probability 
distribution, namely p(ri, • • • , rjv) = Jp(ri, • ■ • , r^v, '^i, ■ ■ • , i'tv) dui, • • • ,dvN. Throughout the paper, when we 
mention the i-th order of a multi-body configurational probability distribution, e.g. p(ri, • ■ • , r^r), we mean its i-th 
marginal distribution, which is defined by p'-'^(ri, • • • , r.i) = J p{ri, • • • , r^, r^+i, • • ■ , rjv) dr^+i • • • dr^. It is obvious 
that whenever we have the z-th order accuracy of a probability distribution, any lower order is automatically accurate. 
In the AdResS scheme, different resolutions in the system are described by a weighting function w(r). Usually the 
higher resolutions is denoted hy w = 1, while the lower resolution is denoted by w = 0. Between the higher and lower 
resolutions, a hybrid region allows a molecule to have both (interpolated) resolutions. The weighting function changes 
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smoothly from to 1; one possible form of sueh a funetion is: 

1 x{r) < 

""^"^^ ^ ^ cos2 [^^^(xir) - r,)] r, < < 
dA< x(r). 

Where x('") is the distanee between the molecule at r and the boundary of the higher resolution region, x('") < 
means that the molecule is in the high resolution region, c^a is the thickness of the hybrid region, rc is the cut-off 
radius of the atomistic interactions. The intermolecular force is modeled by the following interpolation formulap^: 

^ w, WjFff + {1^ w, wj )Fff^ + Wj{l~w,Wj)F':^^. (2) 

Where wi = w{ri) and Wj ~ w{rj). Ff^ , F'^^ are the intermolecular interaction of the atomistic and coarse-grained 
resolutions, respectively. F'^^^ is a force that corrects the COM-COM radial distribution function (RDF) in the 
transition region A—. A further one body force, named thermodynamic force, J'th, is applied to ensure the correct 
thermodynamic equilibrium of the system (for specific details, see Ref. 0): 

i^, = +i^th(r,), (3) 

j 

which is defined by 

PAT + Pal Fth{r)(ir = pcG, (4) 
J A 

where pat is the pressure of the atomistic resolution, pcG that of the coarse-grained resolution, po is the equilibrium 
number density, corresponding to that of a full atomistic simulation. Such a thermodynamic force has been derived 
by empirical considerations regarding the equilibrium of open systems and is based on forcing the equality of the 
grand potential of the atomistic part with the rest of the system. This is then reduced to provide a balancing force 
to the difference of pressure at the wished uniform density of equilibrium. Regarding this point, one of the main 
results of this paper is that the thermodynamic force and the thermostat perform a work in the transition region 
which balances the difference in chemical potential between the different regions. As a consequence (equilibrium of 
the chemical potential) the interpretation of the AdResS simulation as a Grand Canonical sampling is enforced even 
further. A crucial point of this work is the following: the AdResS scheme is not Hamiltonia n^^i^*^ ; however, under the 
hypothesis of fixing DOFs in the hybrid region for a statistical analysis, the atomistic and coarse-grained regions can 
be considered (in good approximation) Hamiltonian; this line of thought gives rise to the basic idea of the present 
study as it is presented in the next sections. Moreover RefsJ^^i^ also show that a space dependent interpolation of 
potentials, which would lead to a direct Hamiltonian approach, is neither satisfactory from the point of view of the 
physical consistency, nor for the computational efiiciency in improving in a systematic way the accuracy of the results 
in the AT region; instead this latter is one of the main results of this work. 

III. THEORETICAL CONSIDERATIONS 
A. The outline of the basic idea 

Here we denote the degrees of freedoms and number of particles in the atomistic region (AT), hybrid region (A) 
and the coarse-grained region (CG) by {xi, Ni), (a;2, -/V2) and (0:3, N^), respectively. Therefore, the target is to prove 
that the atomistic region is subject to the Grand Canonical statistics: 

p(a;i,iV,) = i_e''^*-^^-'^<T(^^) (5) 

Qi 

where the partition function Qi is defined by 

Qi=^ /da^ie'^^-^-'^O^) (6) 
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The marginal probability of finding Ni molecules in the AT region is: 

p(iVi) = y"da;ip(a;i,iVi) = -i-e^^^-^^QjVi (7) 
where Qni is the partition function for a canonical ensemble with Ni atomistic molecules: 

Qm.^ jdx,e-^^^>^^ (8) 

Let us consider p{xi,Ni) = p{xi\Ni) p{Ni). Then, from Eq. ([5|) and ([7]), the conditional probability p{xi\Ni) for a 
truly Grand Canonical ensemble turns out to be 

p(a;i|Ari) --^e-^^"^^"^^ (9) 

The key point of our argumentation is the following: we want to compare the distributions (in the various regions) 
of the AdRcsS simulation with the corresponding ones of a full atomistic reference system, which is ideally divided 
in subregions corresponding to the AT, A and CG regions of the AdResS set up. The first step in our procedure 
is to fix the number of molecules in the atomistic region and consider the conditional probability p(a;i|iVi). If the 
AdResS set up would sample the space in a Grand Canonical fashion, then this probability should be the same as 
the corresponding one of a full atomistic reference system, namely Eq. Then, if the probability of finding A'^i 
molecules in the atomistic region is also the same as the full atomistic reference system, namely Eq. ([7]), we can safely 
state that the atomistic region of AdResS samples configurations in a Grand Canonical fashion. In fact as underlined 
before, it must be noticed that a subsystem of full atomistic system is a natural Grand Canonical ensemble in the 
thermodynamic limit (see, e.g., Ref. Il4r ). The thermodynamic limit in our case is intended in such a way that at 
a fixed density, the size of both the AT and CG regions tend to infinite, and, at the same time, the ratio between 
the extension of AT and that of the CG regions goes to zero; as underlined before, in practical terms, in numerical 
simulation the system size does not go to infinity, however it can be large enough so that within a certain numerical 
accuracy the hypothesis of thermodynamic limit holds. 



B. The conditional probability density p{xi\Ni) 

To prove the above statement about p{xi\Ni), we divide it into two parts: 

p(a;i|7Vi) = E j p{xi\Ni-X2,N2)p{,X2,N2\Ni)6x2 (10) 

where p(xi|A''i; X2,N2) is the probability density obtained by fixing the coordinates and number of particles in the 
region A and considering the distribution of the DOFs in the AT region (see further clarifications below), while 
p(a;2, A''2|A^i) is the distribution in the hybrid region, conditional on the number A^i of particles in the atomistic 
region. We now comment on the underlying assumptions involved in the calculation of the conditional probability 
density: 

1. If we use the weighting function ([1]), then the AT region is interacting with the A region as if the A region were 
part of the AT region because all hybrid molecules interacting with the AT molecules have unit weight. 

2. We assume that p{xi\Ni; X2, N2) can be approximated by 

pixi\Nv,X2,N2) oce-''<T("^'"='^^) (11) 
where the Hamiltonian governing the physics of the molecules in the atomistic region is defined as: 

n%^^ix,;x2,N2)^J2lnW,+ Y.lu^^ir^-r,) + i: E ^^^^(^^ ^ (12) 

j=l i,j=l 1=1 ]=Ni + l 

which is exactly the same as for the equivalent subregion of the full atomistic system of reference. (The 
approximation is essentially based on the assumption that AT and A regions are only short-range correlated; 
this is discussed in the following point.) 
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3. We further assume that all interactions in the system have finite range with given cut-off radius; the electrostatic 
interaction is treated by the reaction field method. Specifically, we suppose that the AT region does not interact 
in a direct way with the CG region. For the case of liquid water at ambient conditions, it is reasonable to 
assume that the system is only short-range correlated, i.e. the AT region is only correlated with the A region. 
We will show numerically that this is indeed the case.) We suppose that all interactions satisfy the superstability 
condition of Ref. ITsl 

4. The hypothesis of fixed molecules in the A region, must not be intended in dynamical terms. This means that 
during the dynamical evolution of the system one should not suppose the molecules in the A region to be frozen 
in their positions while molecules in other regions are moving. Instead this hypothesis must be intended in 
the sense of statistical analysis; within a statistical framework it is intended that for a given configuration 
of molecules in the A region, the AT region explores a large (statistical) number of configurations. In such a 
case, the practical statistical analysis consists of taking a given configuration in the A region, then consider the 
whole trajectory of a simulation and sort out all the configurations in the AT region corresponding to the given 
configuration in the A region. If one repeats the process for a large number of fixed configurations of the A 
region, then the data of the trajectory sorted according to this criterion would lead to a statistics for the AT 
region equivalent to that obtained by a sampling performed according to the given Hamiltonian of the AT 
region (Eq fT^ . 

In accordance with the third assumption, we ignore correlations between the molecules of A and CG regions, in 
fact we even consider the DOFs in the A region fixed while sampling configurations in the AT and CG regions. The 
question then is, whether the probability p{x2, N2\Ni) in the A region is the same as that of the equivalent region 
in a fully atomistic reference system. In general this will not be the case, so we state necessary conditions to enforce 
such an equivalence to lowest order in the configurational probability of the system; the necessary conditions are^: 



The first order marginal distribution is the particle (or molecular) density pa(^'), while the second order marginal 
distribution is the radial distribution function (RDF), gA{r). The necessary conditions of the correct distribution 
p{xi\Ni) are that these two distributions should be the same as those of the fully atomistic reference system. These 
are the minimal necessary conditions involving the basic DOF's, that is the molecular center of mass coordinates, 
however one may require a more specific accuracy by imposing that also any atom-atom g{r) is matched to the 
corresponding function of the atomistic reference system. In general, the conditions of EqlT3] and EqlH] would assure 
that at least at the first and second order p{x2^ N2\Ni) in the AdResS is the same as that of a full atomistic simulation. 
One needs to go at least at the second order, so that at the interface between the atomistic and transition region, 
the radial distribution functions of the atomistic part are not affected by artifact due to the deviation of the RDF's 
of the A region from the correct atomistic reference. We have shown numerically how the RDF correction, can 
be numerically implemented^. Higher accuracy can be then systematically reached by enforcing the equivalence of 
higher orders of the distribution in the transition region. Next, we must show that p{Ni) is the same in AdResS 
and in the reference full atomistic simulation. The basic outline of arguments will be given in a following section 
while the (long) explicit calculations arc reported in the Appendix \^ however, before proceeding further we must 
first treat a key ingredient of this equivalence, that is the thermodynamic force. This force, in fact, together with 
the thermostat, is the crucial tool for assuring the thermodynamic and statistical equilibrium of the AdResS system 
when compared to the reference full atomistic simulation. In the next section, we will show how this force, derived 
on intuitive ground to enforce the equality of some basic thermodynamic relations, as shown in EqHl is, together 
with the action of the thermostat, the key ingredient for the balance of chemical potential between the various region 
at the desired density of equilibrium. The balance of chemical potential is implicitly a strong argument in favor of 
the idea of AdResS as Grand Canonical-like scheme (i.e. molecules are exchanged between the AT and CG region in 
conditions of equilibrium) . 



C. Thermodynamic force: from empirical intuition to strict formalization within the Grand Canonical framework 

Intuitive thermodynamic considerations led to formula In this section we show that, despite the argument 
used in EqH]from Refsi^i^ has a strong empirical component, one can formally justify this force as a tool to balance 
the chemical potential of the various resolutions and, as a consequence, a key aspect in interpreting AdResS as an 
effective Grand Canonical set up. Here, two assumptions are made: 



PA{r) = PAT{r) 
.9A(r) = gAT(f) 



(13) 
(14) 
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FIG. 2. Schematic plot of the AdResS system in thermodynamic equilibrium. The thickness of the filter corresponds to cJa- 



1. N2 Ni ^ 7V3: the second inequality corresponds to the thermodynamic limit of the Grand Canonical 
ensemble. The first one actually assumes that the A region is infinitely thin so that it can be viewed as an 
infinitesimal membrane that allows free exchange of molecules from the AT region to the CG region and vice 
versa. 

2. We have 

nixi,Ni;x3,N3)^H^^Jxi)+n%'^{x3). (15) 

The latter is reasonable if the system is short-range correlated. 

The equilibrium in the AT and CG region is assured by the membrane A via the action of thermodynamic force and 
of the thermostat. Conceptually, we assume there is an infinitely thin "filter" (see Fig. ^ located at the interface 
between the AT and CG regions. When a molecule enters into the AT region or leaves it, the filter does some work 
per molecule, ujo, on the atomistic system in order to assure the thermodynamic equilibrium. Therefore, we add an 
empirical term in the Hamiltonian of the system iViWo, related to the work done by the filter to obtain configurations 
of the atomistic region with A^i molecules. Having fixed the number of molecules, that is ideally considering case by 
case situations at fixed Ni,N3, and by following the arguments of the last section, both the AT and the CG regions 
are subject to the Boltzmann distribution. Thus, the fixed-number partition function (A'^i in the AT region and A^3 
in the CG region) of the system reads 



q{N,V,T) = ^ / da;e-^[«(^^'^^^^^'^^'+^i"°] 



Nl J ^ 



^^^e-^^^'^^QATiNu VuT) Qcg(^3, ^3, T) (16) 



Considering the permutations of particles, the number of possibilities of A''! molecules being in the atomistic region 

jV! 
N1IN3'. ■ 



and N3 molecules being in the coarse-grained region is p^^L , ■ Therefore, the partition function of the whole system 



reads 

Q{N,V,T)= f2 J^^l,^'J^'^ e-'''^'^°QATiN^,V^,T)QcGiN^,V^,T) 

Ni = l ^' ^' 

N 

= e'^''''"°QAT{Ni,Vi,T)QcG{N3,V3,T) (17) 

Afi = l 

with natural relations: 

iV = A^i + A^3 (18) 
V^Vi + Vs (19) 

For convenience, we denote 

gwi =e-'3^i"°QAT(iVi,l/i,r)QcG(iV-iVi,l/3,T). (20) 
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Further let Ni be the value at which q^i reaches its unique maximum, namely = max^jvi- (The existence of a 
maximum follows from the superstability of the interactions that guarantees that particles do not cluster as A'^ — ;> oo; 
we further assume that it is unique.) Since q^^ is positive definite, a basic observation is that 

N 

Qn, < ^^9jVi- (21) 

Afi = l 

Due to the monotonicity of the logarithm, we have 

N 

In <7>^ < In ^ QN, < HNQn, ) = In TV + In g^^ . (22) 

Ari = l 

If In -C IngjVii Laplace's method yields (see, e.g., Ref. [l6l Sec. 4.3) 

JV 

In ^ QN, « liigAfi, (23) 

Afi = l 

(The validity of the antecedent will be discussed later.) Hence 

In QiN,V,T)^ -pNiUJa + In Qat (iVi , , T) + In Qcg (^ - A^i , ^^3 , T) (24) 
or equivalently 

A{N, V, T) « iVic^o + AAT(iVi, Fi, T) + Acg{N - iVi, Fg, T) (25) 

Where A denotes the Helmholtz free energy. At this point, the crucial question is if the condition IniV <C hiqjvi holds, 
or equivalently 

In < -/37ViL^o + In Qat [Ni , V^i , T) + In Qcg ( A^ - , ^3 , T) (26) 

Generally this is true: \nQcG{N — Ni.Vz^T) is proportional to the free energy Acg{N — iVi, V3,T), which is an 
extensive thermodynamic variable, so Aqg(N — iVi, V3,r) is proportional to N — Ni. Due to the thermodynamic 
limit N ^ Ni, Acg{N ~ Ni,V3,T) is proportional A^, which is much larger than InA^ for A^ ^ 1; this validates 
condition (|26p . We will see later (from Eq. (jAll) ') that the maximum Ni corresponds to the maximum value of 
probability p{Ni); this is also the average molecule number in the AT region that is of statistical importance under 
the thermodynamic limit. As the right hand side of Eq. (j25|) attains its maximum at Ni when the other thermodynamic 
variables are kept fixed, differentiating it with respect to A'^i entails 

LOO = MCG ( A^ - iVi , T/g , T) - MAT (iVi , l^i , T) (27) 

This means that the difference in the chemical potential between the AT and CG regions is taken care by the work 
of the filter. Now, if the filter ensures an equilibrium that is the same as the full atomistic reference system, then: 

Wo = Mcg(poV^3, V3,T) - mat(poVi, Vi,r) (28) 

po is the number density at which the atomistic and coarse-grained resolutions should match, namely A^^i = paVi and 
N — Ni = po{V — Vi) should apply. Eq. ((28|) is a necessary condition for the work required to the filter in order to 
have the correct equilibrium of the AT and CG regions. 



As anticipated in the previous sections, the work of the filter has two components, one corresponding to the work of 
the thermodynamic force and the other corresponding to the work of the thermostat, i.e. wq — ^th + i^q. We indicate 
with wth the work of thermodynamic force and with wq that of thermostat in the transition region. The existence of 
wq has been numerically proven in Ref>ii and can be decomposed in two parts: wq = Wdof + Wq^*''^. Wdof is related 
to the thermalization of the degrees of freedom (rotational and/or vibrational) which are reintroduced/removed, and 
according to the equipartition theorem, one has ^ksT per degree of freedom, while uj^^'^'^ is due to the absence of the 
energy conservation that is related to the different intermolecular interactions in the different regions of the system 
as shown in Ref p^. In the Appendix |D] we will show how this term can be numerically calculated within a standard 
AdResS, thus allowing the explicit calculation of the chemical potential of a system. 
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We remind the reader that the argument in this section is essentially based on a large deviations principle (namely, 
Laplace's method) for the number of particles in the AT region that entailed that Ni rather than A^i can be considered 
as representative of the configuration realizations in the AT region. This implies that the particle fluctuations, AA^i, 
are negligible compared to Ni in the AT region. This hypothesis is valid if the AT region is large enough so that iVi 
is large. This point seems to be true in all numerical experiments done so far (see, e.g., Ref . [isl) . In this way, we have 
shown the formal derivation of the thermodynamic force and the thermostat as tools to balance the chemical potential 
between the two regions. For a Grand Canonical-like set up, the balance of chemical potential between open regions 
is a necessary condition to have the exchange of molecules in thermodynamic equilibrium and as a consequence we 
have formally justified why Eq|3]is a necessary condition for AdResS to be considered an effective Grand Canonical 
set up. 



D. The number probability p(Afi) 

In this section, we provide the basic arguments by which one can define at which level of approximation p{Ni) in 
AdResS is the same as in a full atomistic simulation. This is an important point in order to justify the reliability of 
AdResS as a Grand Canonical set up in terms of probability distribution. In fact if p{Ni) in AdResS is very different 
from the corresponding one of a full atomistic simulation, then clearly the AdResS method cannot be considered valid 
since the artifacts due to the different p{Ni) would give a non realistic description of the system. Essentially the 
arguments of this equivalence are two; the 1st order accuracy oi p{Ni), requires the balance of the chemical potential: 

A^CG = MAT + Wo (29) 

This, in the previous section, has been shown to hold because of the action of the thermodynamic force and the 
thermostat in the thermodynamic limit. Whether or not the size of standard and feasible molecular simulations can 
be considered, effectively, in the thermodynamic limit will be checked later on with numerical tests; however we can 
anticipate that the answer is positive for systems whose size and time of simulation are nowadays routinely done. The 
2nd order of accuracy, requires the equality for the compressibility: 

KCG = KAT- (30) 

Eal5Dl implies that the COM-COM RDFs (here, again, COM stands for the center of mass, and RDF stands for 
the radial distribution function) of the atomistic and coarse-grained region are matchedi^. Essentially these are the 
physical requirements to show that up to the second order p{Ni) in AdResS is the same as in the equivalent subregion 
of the full atomistic system of reference. Because of the lengthy arguments, specific details are reported in Appendix [Al 



IV. A NUMERICAL TEST: LIQUID WATER 



The arguments we have used so far are not sufficient to infer that the probability p(a;i, A^i), as presented in the 
theoretical analysis, corresponds to that of a Grand Canonical distribution (Eq. ([S])). One must be very careful about 
any statement on p(a;i, A'^i) due to the following three reasons: 

1. There is no hope whatsoever that the assumed Boltzmann form of the distribution p{xi\Ni; X2, N2) can be 
numerically verified for any realistic test system. 

2. Conditions (jl3p and (1141) are only necessary, and are in general not sufficient to guarantee the correct distribution 
p{x2, A''2|A^i) in the hybrid region A. 

3. Formally, the particle number density p{Ni ) is only a second-order approximation of the correct density pat (Ni ) . 

However, the number distribution p{Ni) has already been proved to be numerically correct in Ref. and|3 for the 
relevant case of liquid water. In this section, we want to test, with a numerical simulation, to what extent, other 
quantities, e.g. the configurational distribution p{xi) — X)jVi -P(^ii -^1)5 ^-re correct. Obviously it is a prohibitive task 
for any complex molecular system to determine the high-dimensional configurational distribution, for this reason, 
instead, we study its marginal distributions up to the third order. Here we report the H-0 and H-H RDFs (see 
Fig. [3]), and the 3-body COM correlation function C^'^^ (si, S2, S2) (see Fig. 0]). For simulation protocols and the 
definition of C^'^^ , see the Appendix |B] and [Cj The first order marginal distribution, that is the molecular density 
profile and the second order marginal distribution, that is the COM RDF are not presented here, we address the 
readers to Ref. 0. Here we focus on the more delicate RDFs which involve the atomistic accuracy, and on the third 




FIG. 3. Local H-H and 0-H g{r)'s. The red line is the curve corresponding to the reference explicit (all atom) simulation 
(EX). The curve obtained by employing the AdResS method is represented in green. The hybrid region is equally divided into 
three parts: HY I, HY II and HY III, the widths of which are roughly equal to the cut-off radius, i.e. 9 nm. The top part of 
each panel shows the region where the g{r) is calculated, and the value of weighting function w{x) there. From left to right, 
the panels correspond to the AT region, the HY I subregion of A, the HY II subregion of A, and the HY HI subregion of A, 
respectively. It can be seen that beyond 0.5 nm the functions go to one, that is particle beyond this distance are uncorrelated; 
this is fully consistent with our hypothesis of Ea llll 
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FIG. 4. The 3-body correlation function C^'^K The 1st row corresponds to the reference all atom simulation (EX). The 2nd and 
3rd row show the difference between the function calculated in AdResS in the AT, HY I regions, respectively, and the reference 
function of the full atomistic case. Different columns give different distances between the first two molecules: from left to right 
Si2 = 0.27 nm, 0.33 nm, 0.80 nm. The horizontal axis corresponds to the variable hi, while vertical axis is the variable h2 (see 
the Appendix [C] for the definition of these variables). The magnitude of the correlation and the differences are indicated by 
different colors. 



order COM three-body correlation function (C'-'^'). From Fig. [3] one can see that the AdResS H-0 and H-H RDFs 
are identical to those of the full atomistic reference system in the AT region. Interestingly, despite the corrective 
force of the RDF is applied only to the COM RDF, also in the region HY I (that is close to the atomistic region) 
the H-0 and H-H RDFs are the same as in the full atomistic case. This implies that the AT region is embedded 
in an environment where not only the COM-COM structural properties but also finer structural properties are the 
same as for the atomistic resolution. In the HY II and HI regions, the AdResS results obviously deviate from those 
of the full atomistic reference system, because the atomistic nature is decreasing as a molecule travels from HY I to 
HY III, so it must be expected that the orientational order is also fading away. In HY HI, the H-0 and H-H RDFs 
are almost structureless. Furthermore, we test C'-'^' in Fig. U] (for the definition of the three-body correlation function 
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FIG. 5. Density and fluctuation profile of the case of the WCA potential in AdResS. The results are compared with the full 
atomistic reference (denoted by "EX") and the standard AdResS (denoted by "std. AdResS"). The error bars indicate the 
statistical error up to a confidential level of 95%. 



C*-"^^ see Appendix [C|) . Interestingly, the three-body correlation is correctly reproduced by the AdResS in the AT and 
HY I regions. This is a further strong argument in favor of the fact that in AdResS the AT region is embedded in an 
environment that, at least up to the three-body COM correlation is the same as the full atomistic reference system. 
In the HY II and HY HI regions, the three-body correlation deviates from the full atomistic reference, in fact since 
the various (7(r)'s already deviates, one cannot expect a higher order correlation function to match (if not by chance) 
the full atomistic function of reference. Another important information of Fig. |3] and is that the hypothesis of 
short-range influence of a region over another employed in Ea llll is numerically justified and thus we can state that 
for the case of liquid water at ambient conditions the AT region is only short-range correlated with the rest of the 
system. 



V. WCA POTENTIAL INSTEAD OF ATOMISTIC-BASED COARSE-GRAINED POTENTIAL: COUPLING THE 
ATOMISTIC REGION TO A GENERIC SOURCE OF ENERGY AND PARTICLES 

The numerical test presented in the previous section shows how an atomistic model and its corresponding coarse- 
grained model, which resemble basic structural properties of the original full atomistic system, can be interfaced in 
an adaptive way and exchange molecules in a Grand Canonical fashion. However one may be tempted to think that 
the request of having a coarse-grained model which resembles as much as possible the structural and thermodynamic 
properties of a full atomistic model would be a necessary condition to have a proper exchange of particles between the 
different regions. If it was so, the idea of the adaptive scheme as a Grand Canonical tool would not be very general; our 
theoretical results instead suggest that it must be sufhcient to impose some given conditions in the transition region 
to have a proper exchange of energy and particles between the small atomistic region and the large coarse-grained 
reservoir. This means that if the theoretical considerations are correct, one should be able to show numerically that 
the proper exchange of energy and molecules is independent from the molecular model used in the coarse-grained 
region; that is, the filter of the transition region makes the atomistic region a Grand Canonical ensemble. For this 
reason we have carried out a further numerical test where the molecules in the coarse-grained region interact with 
a generic WCA potential which assures that only the density, but not the radial distribution function, of a liquid 
governed by this potential is the same as that of the liquid water (for the simulation set up see the Appendix [B|) . 

The left plot of Fig.[5]shows that the density profile of the WCA AdResS is perfectly matching the all-atom reference 
system; satisfying agreement is found also in the AT and HY I regions of the right plot of Fig. [5] which shows the 
molecular number fluctuation over the whole system. Fig. IH] shows the COM-COM, H-0 and H-H RDFs in the AT, 
HY I, HY II and HY HI regions. All the RDFs are compared with full atomistic reference system (EX). Finally, 
Fig. [7] compares the 3-body correlation C'-^-' with the atomistic reference system (this latter reported in the first row 
of Fig. |4]), and plots the difference. It must be noticed that in this case, on purpose, we want to investigate a "worst 
case scenario" and thus we did not apply the RDF corrective force in the transition region. In this way we can 
understand whether or not the corrective action of the thermodynamic force only, is sufficient from the numerical 
point of view to reproduce the Grand Canonical-like environment for the AT region. According to our theoretical 
framework, in absence RDF corrective force only the first order of the distribution (i.e. the density profile) is corrected 
by the thermodynamic force in the A region. However, numerically. Fig. [5] shows a second order accuracy (correct 
RDFs) in the HY I region as well. Instead, the 3-body correlation in HY I deviates from the full atomistic reference; 
this is the price we pay for this "worst case scenario" set up. Nevertheless, the accuracy reached in the transition 
region is sufficient in order to have, at least up to the second order, the AT region of the WCA AdResS equivalent 
to the corresponding subrcgion in the full atomistic reference. Furthermore, numerical results (Fig. [7]) show that, in 




FIG. 6. The COM-COM, H-O and H-H RDFs. The red hne is the curve corresponding to the reference exphcit (aU atom) 
simulation (EX). The curve obtained by employing the WCA AdResS method is represented in blue. The hybrid region is 
equally divided into three parts: HY I, HY II and HY III, the widths of which are roughly equal to the cut-off radius, i.e. 9 
nm. The top part of each panel shows the region where the g{r) is calculated, and the value of weighting function w{x) there. 
From left to right, the panels correspond to the AT region, the HY I subregion of A, the HY II subregion of A, and the HY III 
subregion of A, respectively. 
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FIG. 7. The 3-body correlation of WCA AdResS in the AT and HY I regions. The difference with respect to the full atomistic 
reference (see the fist line of Fig. [4]) is plotted in this figure. From left to right, the distance between the first two molecules 
are Si2 = 0.27 nm, 0.33 nm, 0.80 nm, respectively. 



the AT region, even the 3-body correlation C^"^^ is nonetheless very accurate. This means that, from the numerical 
point of view, even only the work of the thermodynamic force and that of the thermostat, by allowing an exchange 
of particles under conditions of equilibrium of the chemical potential, is sufficient to assure the correct equilibrium of 
the atomistic region. The possible application of the corrective force on the RDF's in A in the case of liquid water 
is, in this case, numerically not required, but in case of necessity it would anyway represent a systematic tool for 
improving accuracy. Summarizing, results reported in Figs. [SJ [Hand [7] show that in the atomistic region the accuracy 
at the third order and particle number fluctuations, for which the filter in the transition region is designed, arc well 
preserved; that is the atomistic region exchange particle and energy with the reservoir in a proper Grand Canonical 
fashion (up to the level of approximation decided a priori) . Being the molecules in the coarse-grained region a generic 
liquid model, we have proved numerically that indeed the adaptive technique employed here can be considered as a 
tool for general Grand Canonical Molecular Dynamics simulations. 



VI. COMPUTATIONAL EFFICIENCY OF ADRESS AS A GRAND CANONICAL SET UP 



In order to show the computational efficiency of the AdResS method as a Grand Canonical set up, we have carried 
out a numerical experiment to compare the performance of our method in inserting and removing molecules from the 
atomistic region and that of a highly popular approach used in Molecular Dynamics, the insertion particle method 
(IPM)^. The IPM is very efficiently implemented into the GROMACS code^i and thus it is a natural choice as a 
Grand Canonical-like set up for a very large portion of molecular dynamics simulators. The experiment consists of the 
following: we have considered a large full atomistic system and applied the highly optimized IPM to insert molecules. 
From this study we extract the efficiency of the IPM in terms of computational resources required. Next, we consider 
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Sys. size (nm"^) 


Traj. length (ns) 


CPU time (hours) 


AdResS 
EX water 


29.9 X 3.7 X 3.7 
3.7 X 3.7 X 3.7 


1 

8 


3.1 

4.5 (traj.) + 36.7 (IPM) 



TABLE I. Comparison of computational efficiency. The AdResS simulation contains 13824 molecules, while the full atomistic 
(EX) water simulations contain 1728 molecules. In the AdResS simulation, the size of the AT + HY regions (6.7 x 3.7 x 3.7nm'') 
is even larger than that of the EX water simulation box, so that we are actually in a "worst scenario" situation. Moreover 
simulations with smaller reservoirs gave essentially the same results reported for this system. "Sys. size" means the size of 
the box used in the simulation. "Traj. length" means the equilibrium trajectory used for the particle insertion in the IPM 
approach. Along the trajectories, frames of configurations were recorded every 0.2 ps. "CPU time" the is wall clock time 
spend on the simulation. For the particle insertion simulation, the CPU time is counted by two parts: the time of generating 
the equilibrium trajectories (traj.) (not expensive) and the time required by the IPM procedure (very expensive). Moreover, 
not only the insertion procedure is expensive on itself, but even after 8ns the insertion process has not actually converged. 
Simulations on longer time scales show that the full convergence is actually never reached, which suggests that the insertion 
procedure for a molecule as simple as water is not fully rigorous from the physical point of view. 



an adaptive system whose number of molecules, in the atomistic plus hybrid region, is at least the same as the full 
atomistic system considered (actually the for results presented here the size considered is even larger than that of the 
EX simulation); to this we add a rather large reservoir of coarse-grained particles (tests were carried out also with 
smaller reservoirs but the results were essentially the same as for the large reservoir). Also in this case the efficiency 
in measured terms of computational resources required. The results show that the IPM procedure itself requires a 
rather high computational cost, to which it must be added long trajectories so that the insertion of one molecule 
per time converges and data can be then collected for real statistics. Instead, the adaptive approach performs, in 
less time and at a much lower computational cost, on-the-fly dynamical multiple insertions keeping the instantaneous 
equilibrium intact, which implies the additional advantage that data can be collected on-the-fly for real statistics at 
any time. Specifically, we used an equilibrium trajectory of length 8 ns where the coordinates of water molecules were 
recorded every 0.2 ps. Using the IPM approach, in each frame, 10^ test particles were inserted at random position of 
the system. For each insertion, 10^ randomly chosen orientations were tried. The convergence is very slow, even at 
8 ns the insertion process is not satisfactorily converged (see Fig|S]), this means that up to this point data cannot be 
collected for analysis of physical properties. Compared to IPM, AdResS leads a very efficient insertion of particles: 
in each 1 ps time interval, (on average) 23 water molecules entered into the atomistic region. Some of them came 
immediately back, because of a unfavorable entering configuration. To investigate the proper insertion, we consider 
the time interval of 1 ns, and find 832 out of 13824 molecules were firstly in either the HY or CG region, and then 
travelled to the AT region and stayed there for longer than 60 ps, which indicates a mean square root displacement 
of roughly 1 nm in the AT region. Moreover, while for the EX simulation one must avoid size effects by choosing 
reasonable large systems, in AdResS the AT region can actually be chosen to be much smaller than the one used here, 
thus enhancing the amount of efficiency. Finally, even if one can afford a rather large atomistic system, thus a faster 
convergence, the cost of the insertion procedure will increase enormously because it would require a larger sampling 
due to the increased number of possibility to allocate a particle in the liquid. These results in terms of computational 
resources are summarized in Table HI Moreover, in previous work, Praprotnik et al^ have shown that the coarse- 
grained region of AdResS can be easily coupled to the continuum; this has the non trivial advantage that the insertion 
of a sphere in a liquid of spheres by IPM is by far much simpler than inserting an atomistically structured molecule in 
a liquid of atomistic molecules. FigIS] shows the extremely fast convergence for inserting a particle in our "reservoir" 
of spherical particles compared to that of a full atomistic system. This means that in the long term perspective the 
part of the algorithm of Praprotnik et al. could be merged with our current approach and thus allow for even smaller 
sizes of the reservoir region. However, this extension, though conceptually straightforward goes beyond the scope of 
this paper. In general, there have been a large number of developments for the GC idea based on the "grow-in" and 
"shrink-out" MD approach (see e.g. Lynch and PetittiS and Eslami and Miiller-Plathoii) however, all of them share 
the same basic principle and face the same problem of the IPM. This numerical experiment shows that despite the 
need of a reservoir, our computation is by far more efficient than any other technique so far used. The comparison 
between the performance of AdResS and IPM as a Grand Canonical set up, leads to the natural question of whether 
or not our method can be employed to calculate the excess chemical potential which is the primary purpose for which 
1PM is used. If this is possible, then AdResS may represent a more efficient computational tool than IPM also for 
calculating such a quantity. In Appendix [D] we provide a practical scheme for using AdResS as a tool to calculate the 
chemical potential and show that the results are in rather satisfying agreement with those obtained by IPM. 
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FIG. 8. The chemical potential calculated by the insertion particle method (IPM). The top panel shows the case of coarse- 
grained (CG) water and WCA liquid while the bottom panel shows the case of a full atomistic (EX) simulation. The horizontal 
dashed line in the bottom panel is used to guide the eyes and represent the ideal target. The plot shows the result that for 
EX water the IPM procedure does not converge within 8 ns; in fact the convergence of the chemical potential means that the 
insertion of the molecule is properly done and subsequent part of the trajectory (with the associated IPM) can be used for 
"statistical" average in a Grand Canonical framework. 
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VII. CONCLUSIONS 



Wc have shown a detailed theoretical analysis about the validity, the limitations and meaning of the AdResS method 
within the framework of a Grand Canonical ensemble. Using strict formal arguments, wc have demonstrated the role 
of the thermodynamic force and of the thermostat in balancing the difference in chemical potential due to the different 
resolutions in space. The theoretical results, derived under the assumption of thermodynamic limit, are then checked 
with several numerical tests. These latter represent a "worst scenario" case since the conditions of simulation are 
not ideal as in the analytical procedure. Next, we have shown that, under given hypothesis, p{Ni), the probability 
distribution in the atomistic region of AdResS, is equivalent to that in the same region in a full atomistic simulation 
up to the accuracy of second order. We have further strengthen our hypothesis and conclusions by carrying out a 
numerical experiment for the case of liquid water at ambient conditions where we could even go beyond the COM 
RDF. A further numerical experiment was made where the atomistic-based coarse-grained model was substituted by 
a generic liquid of spheres interacting via WCA potential. Results show that the accuracy in the atomistic region 
is the same as that of AdResS based on a coarse-grained model derived from the full atomistic reference system; 
this strengthen the idea of AdResS as a tool for coupling the atomistic region to a generic source of energy and 
particles in a Grand Canonical manner. Finally we validate the efficiency of our method as a Grand Canonical set 
up comparing its computational performance to that of well established and largely used alternative methods. In 
conclusion, these results provide a solid theoretical basis to explain the numerical reliability of the AdResS as an 
effective Grand Canonical set up and provide basis for further formal and numerical development of the adaptive idea 
in terms of matching probability distributions. 
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Appendix A: The number probability p{Ni) 



In this section, we define at which level of approximation p{Ni), in AdResS is the same as in a full atomistic 
simulation. Let us consider the marginal distribution p{Ni) 



m 



da;ida;g 



Q{N,V,T)N\ 



.-/3[«NT(a:i)+WN3(a;3) + Ni"n] 



TViITVg! Q{N,V,T)N\ 



e-^^^-^«QAT{Ni,Vi,T)QcG{N - Ni,V3,T) 
E„, e~0"^^<>QAT{ni,Vi,T)QcG{N - V^,T) 



(Al) 



where ni is the summation variable that goes from to N . In any case, when ni is not much smaller than N (i.e. 
out of the range of the thermodynamic limit), the statistic is not relevant, and can be safely neglected. The marginal 
number distribution of the full atomistic simulation, which the AdResS simulation should reproduce is 



{Ni,Vi,T)QATiN - Ni,V3,T) 



En, QAT(ni,Fi,T)QAT(iV -ni,V3,T) 



(A2) 
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We want to calculate the difference between p(iVi) and pat (-^i); that is the difference between the particle probability 
in the atomistic region of AdResS {p{Ni)), and the particle probability in a full atomistic simulation in the region 
equivalent to the AT region of AdResS. In order to do so we proceed with the following technical operation: multiply 
the numerator of p{Ni) by the denominator of pat(A^i) (denoted by Ti), and multiply the numerator of pat(A^i) by 
the denominator oi p{Ni) (denoted by T2). It follows: 

Ti = e-''^i'"°QAT(A^i, Vi,T)QcGiN - A^i, "t^s, T) x ^ QAT{ni,Vi,T)QAT{N - m, ^3, T) (A3) 

ni 

T2 = QAT{Ni,Vi,T)QATiN ~Ni,V3,T)xJ2 e-''"^"°QAT(ni, l^i, T)gcG(iV - m, V^g, T) (A4) 

ni 

The difference between Ti and T2 is basically the difference between p{Ni) and pat(A^i)- Calculating Ti: 

Ti=J2 ^'^'''""QATinu VuT) Qat{Ni,Vi,T) QatIN - m, ^3, T) QcciN ~ Ni,V3,T) 
■111 

= ^ exp { -(3[luoNi + AATini,Vi,T) + Aat{Ni, V^T) + Aat{N - m, V^a, + AcciN - A^i, ^3, T)] } 
"1 

(A5) 

Notice that the free energy is extensive. By applying Euler's theorerr^ we thus find 

Aat{n - m, F3,r) = V3Aat{ \ = i^3^at(po + \, , (A6) 

Acg{N ~ N,,V3,T) = VsAcci ^ ^/^\ l,T) = VsAcciPo + ^'~^\ l,T), (A7) 

where Ni is the number of molecules in the atomistic region at the wished density po of equilibrium, namely iVi — 
PoVi = N — P0V3. As we work under the hypothesis that the atomistic region is much smaller than the whole system, 
it is reasonable to assume jVi ^ N. Moreover we have found that, in the thermodynamic limit, iVi iVi ^ Ni 
which carries over to the running index 711 in the grand canonical partition function that is non- negligible only if 
m TVi ~ Ni. Hence Aat{po + {Ni - ni)/V3, 1,T) and Acg{po + (Ni - Ni)/V3, 1,T) can be regarded as the free 
energies per unit volume, with the density perturbed from po by (TVi — ni)/V3 and {Ni — Ni)/V3, respectively. These 
perturbations originate from the fluctuating number of molecules in the atomistic region: when the number deviates 
from A^i, molecules enter into the coarse-grained region and the density is perturbed. In the thermodynamic limit, Ni 
is comparable to ni and iVi, and all of them arc supposed to be much much smaller than while V3 is of the same 
order of magnitude of V (the large CG region compared to the small AT region). As a consequence the perturbations 
(Ai — ni)/V3 and (A^i — Ni)/V3 are small, compared to po ~ N/V and Taylor expanding the free energies (jA6p and 
(|A7P about Po yields 

AAT{po + ^T^,l,T)=AAT{po,l,T) + t,AT[^T^)+:^2 (^ir^) +«(!) (A8) 

1/3 v 1/3 / 2p^ KAT ^ V3 y 

^cG Po+ \, \ i,rHAcG Po,i,r +/icG \, ) + 7r2 \. +0 I A9 

V3 \ V3 J 2/9g KCG ^ V3 J 

(We use the Landau notation o(l) to collect all terms that are asymptotically negligible, bearing in mind that in the 
thermodynamic limit Ai ~ Ai ; sec also Ref. 0) . By using the conditions ([2^ and (pOI , we have Ti 

Ti^Y. { - /?[^AT(ni, "^^1, T) + Aat{Ni,Vi,T) + Aat{poV3,V3, T) + Acg(po1^3, V3, T) 

2po I^AT 



V3 r / A'l - ni \ 2 / TV, _ AT, X 2-, 1 
+ {pAT + PcG)m-pAT{m+n,) + -^^ (^rr^) +( \, M +0(1) I (AlO) 

2pn KAT V3 / V V3 / J J J 



Similarly, we calculate T2: 

T2 = J2 Qat (r^i , , T) QAT(iVi , , T) QcG ( A^ - rii , ^3 , T) Qat{N - A^i , F3 , T) 

"1 

= ^ exp { - l3[uoni + AAT{ni,Vi,T) + Aat(A^i, Vi,T) + AcoiN -ni,V3,T)+ Aat{N - A^i, F3, T)] } 
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FIG. 9. The schematic plot of the 3-body correlation function. 



with similar expansions and by the conditions Eq. (|29p and (j30p . we have: 

= ^ cxp { - /3 [Aat (ni , Fi , T) + Aat (iVi ,Vi,T) + Aat {poVs , , T) + Acg (poVs , ^3 , T) 



+ (a^AT + PCg)Ni - PAT {Ni + Til) + 



2/Oo KAT 



Ni- Ni\2 



V3 



0(1)]} 



(All) 



Since the difference between the probabilities p{Ni) and pat(-^i) is basically the difference between Ti and T2 
renormalized by the partition functions of the distributions, by invoking Laplace's method again, we see p{Ni) and 
Pat(.^i) agree up to the second leading order with respect to the density perturbations (Ni — rii) /V3 and {Ni—Ni)/V3, 
in the limit V3 ^ 00. 



Appendix B: Simulation protocols 

The testing system contained 3456 SPC/B^ water molecules in a 7.50nm x 3.72nm x 3.72nm periodic box. The 
system was divided along the x direction into one atomistic region of width l.OOnm and one coarse-grained region 
of width l.OOnm connected by two hybrid region of width 2.75nm. The hybrid region was equally divided into three 
sub-regions (with width G.9nm) to calculate the local RDFs. These sub-regions are called HY I, HY II and HY 
III, from the AT side to the CG resolution side. The simulation was made at room temperature of 300K. The 
time step was At = 0.002ps. The cut-off radius rc used for all interactions was 0.9Gnm. The electrostatic interaction 
method used for the atomistic region was the reaction field method. All simulations were performed by MD simulation 
software Gromacs^i and VOTCA^^. In the WCA-based simulation, the interaction potential between particles in the 
coarse-grained region is given by 

r<2i/V (Bl) 

The parameters used in this study are e — 0.65 kJ/mol and <t ~ 0.30 nm. These parameters are chosen such that the 
soft core of WCA is roughly the same as the structure-based coarse-grained water model. Other parameters used in 
the simulation are the same as the standard AdResS. 



Appendix C: The third order correlation function 

We consider the following definition of three-body correlation function of a molecular system: 

C(3)(S1, S2, S3) = , , / , , M p{Sl) - P{S1)) ■ ip{S2) - P{S2)) ■ {Pis3) - Pis3))) (Gl) 
p{Si)p(S2)p{S3)^ 

where the transient density distribution p(s) and the average density distribution p(s) are given by: 

N 

/i(s) = ^,5(s-r,) and p{s) = {p{s)). (G2) 
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By assuming that the system is homogeneous, one easily can show that 



C^^\si,S2, S3) = — p{si,S2, S3) [p(si, S2) + S3) + p{s2, S3)] + 2 



(C3) 



where 



p(si,S2) = (p(si)/5(s2)) and p(si, S2, S3) = (/5(si) p(s2) ^(sa)) 



(C4) 



Since g{si, S2) = p(si, S2)l is the two-body radial distribution function, it follows: 



C(^^(si, S2, S3) = — p(si, S2, S3) - [g(si2) + 5(513) + g(s23)] + 2 



(C5) 



where, for example, syi = |si — S2I. 

Since we assume the system to be homogeneous, the visualization of this high-dimensional distribution is simplified 
as follows. We first fix the distance between molecule 1 and 2 (Mol 1 and 2), then plot the correlation function with 
respect to the position of the third molecule (Mol 3, see Fig. The position of Mol 3 can be described by the 
variables h\ and /i2, which are the projection of Mol 3's position on the to the axis S12 defined by Mol 1 and 2. 

Appendix D: Estimate of the excess chemical potential by AdResS 

In this paper we have proven that the work of a thermodynamic force and the work of the thermostat in the 
transition region, which are the ingredients needed to provide density equilibrium across the system, correspond to 
the difference of the total (kinetic and excess) chemical potential between the atomistic and reservoir region. Thus 
in principle the calculation of the total work done by the thermodynamic force and that of the thermostat would 
allow for the determination of, at least, differences of total chemical potential, and, when the kinetic part is properly 
removed, of differences in the excess chemical potential. While the calculation of the work of the thermodynamic 
force, wth implies simply the straightforward calculation of the integral of such a force over the transition region, the 
work of the thermostat , wq, for allowing the proper change of resolution of the molecules across the transition region 
is somewhat a much more delicate issue. In essence, the calculation of the chemical potential, would require the 
accurate determination of Wq^'''', which unfortunately cannot be done in a direct and efficient way. However, we have 
derived a procedure by which such a quantity can be calculate in an indirect way without additional computational 
efforts in a standard AdResS simulation, in the following part we illustrate such a procedure. 

1. Auxiliary Hamiltonian Approach 

We define a system characterized by an atomistic region, a transition region and a coarse-grained (reservoir) region, 
as in a standard AdResS. We couple the different resolutions of the system via a Hamiltonian where the total potential 
is given by the interpolation of atomistic and coarse-grained potential in the same fashion of the force interpolation 
of standard AdResS: 



Next we determine the thermodynamic force for this system which leads to the same equilibrium density as that 
of the standard AdResS. We indicate such as force as i^^, and the superscript H is used to distinguish it from 
the thermodynamic force calculated with the standard AdResS, which we have indicated as i^th- We have now the 
following situation; in the approach based on the Hamiltonian, the changing of resolution in the transition region at 
equilibrium is provided by the work of f ^ (here indicated as w^) and by the work of the thermostat to thermalize 
the reintroduced/removed degrees of freedom Wdof- Instead, in the standard AdResS, we have the same equilibrium 
density of the Hamiltonian approach, but the change of resolution is provided by i^th, '^dof and uf^^^^ . Since we have 
the same equilibrium and Wdof is the same in both cases, it follows that: 




(Dl) 



extra 



(D2) 



'Q 



The limitation of this procedure would be the fact that Wq^*'''^ (and thus the chemical potential) could not be calculated 
within a standard AdResS simulation and one would require an auxiliary simulation, although the calculation is anyway 
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FIG. 10. The difference between the thermodynamic force of the potential interpolation-based scheme and force interpolation 
of the standard AdResS. The green solid line shows the difference of the thermodynamic forces, i.e. Ai^th = J'th ~ Fth- The 
blue and red symbols indicate the quantity (k;Vui()7^^ — U^'^)) calculated in the two different approaches (blue corresponds to 
the standard AdResS while red to the Hamiltonian-based approach) . The equivalence between the two approaches is suggested 
by the fact that AFth corresponds to (wVu'(l7^^ — U''^^)), and that this latter is actually the same independently from the 
approach used to calculate it. 



not computationally demanding. However, based on an extended number of numerical tests, we could develop this 
idea further and calculate it by the standard AdResS simulation. To do so we have numerically studied the quantity 
AFth = -Fth ~ -^th for an extended number of systems, and proved that this is equal to {wT/w{U-^^ — t/*^*^)) (in 
FiglTOl is reported the case of liquid water). This quantity can be easily calculated without additional costs in a 
standard AdResS simulation, so u^^'^^ is calculated by 



The quantity —{w\7w{U^'^ ~ U'^^)) is the average of the additional force (with respect to the standard AdResS) one 
would have if the intermolecular force is calculated from Ea lDll as it happens in the Hamiltonian approach. This 
provided some physical arguments for the equality with AJ'th (besides the numerical evidence employed by us in first 
instance). In fact, if one writes down the average total force acting in the Hamiltonian approach and that in the 
standard AdResS, the difference between them is: Ft\,-F\^{w^w{l]'^^ -U'^^)). Since AJ'th = {wSI wilJ -U'^^)) , 
this implies that the total average force acting in the two approaches is the same, which further implies that the two 
approaches are essentially equivalent from a numerical point of view as one would expect since they have the same 
equilibrium. However, this procedure is based mainly on practical considerations and numerical experiments, thus at 
this stage it must be considered by the reader as a preliminary procedure although the essential ingredients seems to 
be already well defined. Instead, an elegant procedure for adaptive simulation based on the interpolation of potentials 
and rigorous thermodynamics considerations has been recently presented by Potestio et al?^ . Finally, as already 
stated before in this work, the approach based on the Hamiltonian shall not be considered a proper Hamiltonian 
approach to adaptive resolution, but only an auxiliary tool to determine in simple way a specific quantity of interest. 
In fact, if this is considered a proper Hamiltonian approach, one would have the physical inconsistency underlined in 
Ref.'^^, that is the inevitable fact that the total potential (i.e. interpolated potential plus the potential corresponding 
to the thermodynamic force) in either the coarse-grained region or the atomistic region is not that of the original 
system (as it should be in a proper adaptive Hamiltonian procedure). However, in our view, the most important 
limitation is that, as shown in the current work, in this case the numerical accuracy can be assured only at the first 
order and there would not be a systematic way to improve it. This is the difference with the force interpolation-based 
method proposed here, where corrective forces in the transition region impose numerical correctness of the probability 
distribution of the system at higher orders. At this point, according to the procedure presented above, we have all the 
ingredients to calculate the excess chemical potential using AdResS and compare it with the same quantity calculated 
with the 1PM, this is done in the next section. 




(D3) 
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2. Calculation of ^exc and numerical experiments 



For the difference of tlie total cfiemical potential between coarse-grained and atomistic region we have: 

^AT _ ^CG ^ f p^^^^^ ^ ^oxtra ^ ^^^^^ 



The excess chemical potential is defined by removing the kinetic contribution from the total chemical potential, i.e. 
Mcxc — IJ- ^ A*kin, ifJ-kin is the kinctic part of the chemical potential, which can be calculated analytically). It follows 

that Ai^T _ ^CG ^ J p^^^^ ^ ^oxtra ^ ^^^^ _ ^^^.^ ^ J^p^^ ^ {w\/w{U^^ - U^^))]dx + Wdof " A/Xkin- Ncxt, by 

knowing which can be calculated a rather low cost with IPM, one obtains whose calculation is instead 

very expensive with IPM. 

We have carried out several numerical tests interfacing spherical systems with different representation; we have found 
always a very satisfactory agreement between the chemical potential calculated with AdResS and that calculated with 
IPM. More importantly, we have applied this idea to liquid water and found a value of —22.8 kJ/mol with AdresS and 
a value of — 24.6kJ/mol with IPM (still not fully converged after about 400 ns), which must be compared with the 
value from literature of —23.5 kJ/molii. The difference between the value from our approach and that from IPM is 
about 8%, which is not significant, but the computational costs of the IMP is much larger than that based on the use 
of AdResS. 



